Abstract

Background: The genus Pyrus belongs to the tribe Pyreae (the former subfamily Maloideae) of the family 
Rosaceae, and includes one of the most important commercial fruit crops, pear. The phylogeny of Pyrus has not 
been definitively reconstructed. In our previous efforts, the internal transcribed spacer region (ITS) revealed a poorly 
resolved phylogeny due to non-concerted evolution of nrDNA arrays. Therefore, introns of low copy nuclear genes 
(LCNG) are explored here for improved resolution. However, paralogs and lineage sorting are still two challenges 
for applying LCNGs in phylogenetic studies, and at least two independent nuclear loci should be compared. In this 
work the second intron of LEAFY and the alcohol dehydrogenase gene (Adh) were selected to investigate their 
molecular evolution and phylogenetic utility. 

Results: DNA sequence analyses revealed a complex ortholog and paralog structure of Adh genes in Pyrus and 
Malus, the pears and apples. Comparisons between sequences from RT-PCR and genomic PCR indicate that some 
Adh homologs are putatively nonfunctional. A partial region of Adh1 was sequenced for 18 Pyrus species and three
subparalogs representing Adh1-1 were identified. These led to poorly resolved phylogenies due to low sequence
divergence and the inclusion of putative recombinants. For the second intron of LEAFY, multiple inparalogs were
discovered for both LFY1int2 and LFY2int2. LFY1int2 is inadequate for phylogenetic analysis due to lineage sorting
of two inparalogs. LFY2int2-N, however, showed a relatively high sequence divergence and led to the best-resolved 
phylogeny. This study documents the coexistence of outparalogs and inparalogs, and lineage sorting of these 
paralogs and orthologous copies. It reveals putative recombinants that can lead to incorrect phylogenetic 
inferences, and presents an improved phylogenetic resolution of Pyrus using LFY2int2-N. 

Conclusions: Our study represents the first phylogenetic analyses based on LCNGs in Pyrus. Ancient and recent 
duplications lead to a complex structure of Adh outparalogs and inparalogs in Pyrus and Malus, resulting in 
neofunctionalization, nonfunctionalization and possible subfunctionalization. Among all investigated orthologs, 
LFY2int2-N is the best nuclear marker for phylogenetic reconstruction of Pyrus due to suitable sequence divergence 
and the absence of lineage sorting. 



Background 

The genus Pyrus L. belongs to the tribe Pyreae Baill.
(the former subfamily Maloideae C. Weber) of the 
family Rosaceae [1] and is geographically divided into 
two groups: occidental pears and oriental pears [2]. The 
majority of oriental pears are native to China; a few are 
native to Japan and the Korean Peninsula. Chinese taxonomists
agreed on 13 Pyrus species native to China,
among which P. betulaefolia Bge. and P. calleryana
Decne. have retained characteristics believed to be ancestral
for Pyrus [3], including the smallest fruit size and
lowest carpel number. Based on morphological traits or 
crossing experiments, P. x bretschneideri Rehd., P. x ser- 
rulata Rehd., P. x sinkiangensis Yu, and P. x hopeiensis 
Yu are putative hybrids among other Pyrus species 
[4-6]. The circumscription of species, subspecies, and 
forms for occidental species remains controversial. It is 
believed that some cultivated pears frequently escaped
from cultivation and became feral. These plants hybri- 
dize easily, both with cultivated and wild species, result- 
ing in a number of intermediate forms and segregants 
[7]. Therefore, morphological characters are poor indi- 
cators of Pyrus phylogeny. Other data sources, like che- 
mical characters, were used to distinguish some pear 
species [8], but these were plagued by a low number of
characters, polymorphisms, and environmental plasticity.
During the last decade, molecular markers including 
RFLPs [9], RAPDs [10-12], genomic-SSRs [13,14], EST- 
SSRs [15,16] and AFLPs [10,17] have been applied in 
Pyrus. These data provided useful information on the 
origin of some cultivated pear groups, e.g. Chinese 
white pears (CWP), which are assigned to P. x bretsch- 
neideri. However, CWP are morphologically different 
from the so-called wild P. x bretschneideri in northern 
Hebei province [4]. They also show a close relationship 
to P. pyrifolia based on multiple molecular marker data 
and thus were treated as P. pyrifolia White Pear Group 
[11,14,17]. However, most of the studies have focused 
on the relationships of several oriental or occidental spe- 
cies or cultivar pear groups, and the phylogeny of the 
genus remains unresolved. 

Plastid DNA sequence data and the internal tran- 
scribed spacer (ITS) of nuclear ribosomal DNA 
(nrDNA) have been used for plant phylogenetic recon- 
struction due to ease of amplification. In the Rosaceae, 
these data have been applied in phylogenetic studies at 
different taxonomic levels [18,19], but their utility is 
limited due to varied evolutionary rates [20-25]. Intra- 
individual ITS polymorphisms caused by incomplete 
concerted evolution of nrDNA arrays have been found 
in many Rosaceae [20,21,26,27]. Such polymorphisms 
provided evidence for hybrid origins of some species in 
Rosa [27], but they led to a poorly resolved phylogeny in 
Malus [21]. Similarly, our previous study in Pyrus [28] 
revealed a history of non-concerted evolution of ITS 
and a poorly resolved phylogenetic tree. Six non-coding 
regions of plastid DNA were found to be highly con- 
served in Pyrus [29], but they reflect only the maternal 
genealogies. 

An alternative source of molecular sequence data, 
low-copy nuclear genes (LCNGs), has proven to be 
more phylogenetically informative than either ITS or 
plastid DNA [30]. These genes reflect biparental lineages 
and are less prone to homogenization [31-33]. Owing to
the accumulation of a large number of gene sequences in
GenBank, it is now possible to design taxon-specific primers.
However, paralog and lineage sorting problems
are still challenges to applying LCNGs to phylogenetic 
studies, since they may lead to topological incongruence 
similar to those caused by hybridization [31,34]. Gene 
duplication is a prominent feature of plant genome 



evolution, and duplicate segments account for 60% of 
the Arabidopsis thaliana genome [35]. In molecular 
phylogenetic studies, nuclear genes undergoing gene 
duplications or the birth-and-death process lead to pro- 
blems in the identification of orthologs and paralogs 
and discordance between gene and species trees. Additionally,
frequent gene duplications have made the terms
'paralogy' and 'orthology' ambiguous. Thus the new
terms 'inparalog', for paralogs that evolved after the
ingroup speciation, and 'outparalog', for those that
evolved before the ingroup speciation, were introduced [36]. Lineage
sorting (or deep coalescence) results from random fixa- 
tion of ancestral polymorphic alleles, which may induce 
similar topological incongruence to that of hybridization, 
and poses the most challenging problems for inter- and 
intra-specific phylogenetic inference [31,37]. However, 
lineage sorting is a random process, and fixation of 
ancestral alleles among species is rarely identical for two 
unlinked nuclear loci. Therefore, incongruence caused 
by hybridization and lineage sorting could be differen- 
tiated when comparing phylogenies based on multiple 
unlinked nuclear loci. 

LCNGs that succeeded in other Rosaceae are poten- 
tially ideal nuclear markers for phylogenetic studies of 
Pyrus. The coding region of GBSSI has been successfully 
applied at intergeneric and higher levels [26,38], but the 
introns are too short and dispersed to be ideal gene 
regions for interspecific-level studies. Another gene
region is the second intron of LEAFY, which is long 
enough and has been proven to be informative for stu- 
dies at the interspecific level in Pyreae [39,40]. Complete 
coding sequences of two LEAFY loci have been isolated 
in Pyrus [41], and the corresponding genomic sequences 
in Malus species are available (DQ535885-AFL1,
DQ535886-AFL2). Thus obtaining introns of LEAFY in 
the Pyrus taxa is possible. 

The alcohol dehydrogenase gene (Adh) is one of the best-studied
nuclear-encoded genes in plants. Two major
ADH classes, class P with alcohol dehydrogenase activity and
class III with glutathione-dependent formaldehyde dehydrogenase
activity, have been identified in flowering dicot or monocot plants.
The former is common in plants and the latter has
been isolated in a few taxa including Pisum sativum
(P80572) [42], Oryza sativa (U77637) and Arabidopsis
thaliana (X82647) [43]. The Adh gene occurs in small
gene families, and has proven to be a useful phylogenetic
marker in the Poaceae and Paeoniaceae [44-46],
but the gene family was too complex or provided little
phylogenetic resolution in other taxa such as Gossypium
and Carex [47,48]. Two distinct Adh loci
(AF031900, AF031899) have been isolated in P. communis
'Packham's Triumph' [49], thus it is possible to isolate
Adh genes among Pyrus species. However, the only
genomic Adh sequence from a species of Rosaceae in GenBank is
that from Fragaria ananassa (X15588) [50],
and the exon/intron structure is unknown in Pyrus. The
phylogenetic utility of the Adh coding regions and
intron regions has not been determined in any Rosaceae
taxa.

Since no LCNG analysis had been applied to phyloge- 
netic studies of Pyrus, we explored the utility of LEAFY 
and Adh. In this study, a comparison of genomic and 
RT-PCR-based approaches yielded an initial description 
of the composition and functionality of the Adh gene 
family in Pyrus. The phylogenetic utility of Adh gene 
regions and the second intron of LEAFY were deter- 
mined after examining the sequence divergence, gene 
duplications, lineage sorting and recombination. 

Malus taxa were once assigned to Pyrus, but Miller 
treated Malus Mill. as a separate genus in 1768 due to
graft incompatibility between the two [51]. Malus taxa 
originated before Pyrus taxa according to the fossil 
occurrence [52], and are here used as outgroups 
(Table 1). 

Results 

Gene structure and paralog identification based on long 
Adh sequences 

A total of 17 Adh1 and eight Adh2 long partial
sequences were obtained by genomic PCR (G-PCR)
using different primer sets (Table 2) in 'Cuiguan' (P.
pyrifolia), 'Nanguoli' (P. ussuriensis), 'Korlaxiangli' (P.
sinkiangensis), 'Flemish Beauty' (P. communis), M. rockii,
M. domestica subsp. chinensis and 'Ralls' (M. domestica).
All of these Adh genes encoded medium-chain ADH
enzymes with 380 amino acid residues. After phylogenetic
analyses, gene structure and sequence divergence
comparisons, two paralogs each were identified for Adh1 (Adh1-1,
Adh1-2) and Adh2 (Adh2-1, Adh2-2).

As shown in Figure 1, Adh2-1 had a classical Adh
gene structure with nine introns similar to Zea mays,
Fragaria ananassa, and other characterized plant Adh
genes [53]. Exon and intron codes described in this
study were named following the classical gene structure
to avoid confusion. Adh2-2 had lost intron 4, while both
Adh1-1 and Adh1-2 had lost intron 7. Adh1-1 and
Adh1-2 have the same gene structure, but the former
was obtained with the downstream primer Adh1-R1 located in
the 3'UTR region, while the latter was obtained only with the
downstream primer Adh1-R2 located in the last exon (Figure
1). This indicates that the 3'UTR region of Adh1-2 may
be divergent from that of Adh1-1. However, we were
unable to obtain the 3'UTR region of Adh1-2. Adh1-2
obtained in 'Cuiguan' (P. pyrifolia) displayed a 20-bp
deletion in exon 4. One of the three Adh1-2 clones in
'Korlaxiangli' (P. sinkiangensis) and one of the three
Adh1-2 clones in 'Ralls' (M. domestica) displayed one and
two 1-bp deletions, respectively, in exonic regions. A
stop codon occurred in the exonic region in one of the
two Adh2-2 sequences of M. rockii. These sequences
were deemed putative pseudogenes and were removed
from subsequent sequence analyses.
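The screening criteria described above (frame-disrupting deletions and internal stop codons in exonic regions) can be illustrated with a minimal Python sketch. It is not the procedure used in this study; the function name `screen_coding_sequence` is hypothetical, and the sketch assumes that the exons have already been concatenated, that the sequence begins at the first position of a codon, and that the standard nuclear stop codons apply.

```python
# Minimal sketch (hypothetical helper, not the authors' pipeline):
# flag concatenated exon sequences that look nonfunctional.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def screen_coding_sequence(cds: str) -> list:
    """Return a list of reasons why a coding sequence may be a pseudogene.

    Assumes `cds` starts in frame (codon position 1); alignment gaps ('-')
    are stripped before the checks.
    """
    seq = cds.upper().replace("-", "")
    problems = []

    # A length that is not a multiple of three implies a frame-disrupting
    # indel, e.g. a 1-bp or 20-bp deletion.
    if len(seq) % 3 != 0:
        problems.append("frameshift")

    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    # A stop codon at the very end is the normal terminator; ignore it.
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    if any(codon in STOP_CODONS for codon in codons):
        problems.append("premature stop")
    return problems
```

Under these assumptions, an intact toy sequence such as `ATGGCTTGGTAA` passes, whereas the same sequence with one base deleted is reported as a frameshift. Because the sequences here are partial, the reading frame would have to be fixed from a reference before such a check is meaningful.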

Among these four Adh paralogs (Adh1-1, Adh1-2,
Adh2-1 and Adh2-2), only the intron regions of Adh1-1
and Adh1-2 could be aligned. Therefore, only the coding
regions were used for nucleotide sequence divergence
(NSD) and amino acid sequence divergence (ASD)
comparisons. As shown in Table 3, NSD between Adh1 and
Adh2 paralogs was as high as 0.3, while that within each
paralog was lower than 0.02. NSD between Adh2-1 and
Adh2-2 (0.19) was much greater than that between
Adh1-1 and Adh1-2 (0.06). NSD of Adh1-1 and Adh1-2
between Pyrus and Malus was 0.027 and 0.035,
respectively. All of these sequence divergence comparisons
were consistent with the subsequent phylogenetic
inferences, indicating that our identification of paralogs was
accurate. In most cases, NSD was greater than ASD
between different homologs, while NSD was less than
ASD within each homolog.
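Table 4 states that NSD was calculated by the K-2P (Kimura two-parameter) method. Assuming the same model applies to the comparisons above, the pairwise distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The following Python sketch shows the calculation on aligned sequences; the function names are hypothetical, and sites with gaps or ambiguous bases are simply skipped, which may differ from the software settings actually used.

```python
# Minimal sketch of the Kimura two-parameter (K2P) distance on aligned
# sequences; function names are illustrative, not from the study.
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """K2P distance between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumption)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1   # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def mean_pairwise_k2p(sequences) -> float:
    """Mean K2P distance over all pairs, as in a 'mean NSD' table entry."""
    pairs = list(combinations(sequences, 2))
    return sum(k2p_distance(a, b) for a, b in pairs) / len(pairs)
```

For example, a single transition in ten compared sites gives P = 0.1 and Q = 0, hence d = -1/2 ln(0.8), approximately 0.112, slightly above the uncorrected proportion of 0.1.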

Both maximum parsimony (MP) (Additional file 1) 
and neighbor joining (NJ) (Figure 2) trees based on 
amino acid sequences of Adh genes from different plant 
taxa displayed similar topologies. Class III ADH formed 
clades separate from the putative class P ADH. The Adh 
genes from the Solanaceae, Brassicaceae, Anacardiaceae, 
and Fabaceae were monophyletic with high bootstrap 
values, suggesting that duplication events leading to 
these Adh genes occurred independently after diversification
of these plant families. The Adh genes from
Paeoniaceae and Poaceae also formed monophyletic
clades, but with low bootstrap values. Different
Adh genes in Malus and Pyrus (Rosaceae) were not
monophyletic. Adh1-1 and Adh1-2 in Malus and Pyrus
formed two sister clades with Fragaria ananassa as
their sister clade, suggesting that the gene duplication leading
to these two outparalogs occurred prior to the diversification
of Malus and Pyrus. Adh2-1 and Adh2-2 in
Malus and Pyrus formed another clade. Due to lack of 
related sequences in other Rosaceae taxa, we cannot 
infer their origin. Among investigated families, Fabaceae 
is the most closely related to Rosaceae, but their Adh 
genes did not show a close relationship, indicating that 
the Adh genes have become highly diversified within 
each family. 

Phylogenetic analyses based on reduced Adh1 sequences

Only a short region containing intron 2 and intron 3 of
Adh1 (reduced Adh1) was sequenced from more Pyrus
species to investigate the phylogenetic utility of this
region. Adh1-1 was preferentially amplified and only
two Adh1-2 sequences were obtained, in P. amygdaliformis
and 'Dangshansuli' (P. pyrifolia, CWP). Three
Table 1 Plant taxa used in this study and subparalogs of LFY1int2, LFY2int2 and Adh1-1 recovered in each accession

The last three columns give the subparalogs or copy types c.

Accession a | Species | Origin | Leaf source b | LFY1int2 | LFY2int2 | Adh1-1
'Korlaxiangli' | P. x sinkiangensis Yu | Xinjiang, China | CPGR | a, b | N | a, b, c
'Cuiguan' | P. pyrifolia | Cross | Zhejiang University | 1 | 1 | 1
'Nanguoli' | P. ussuriensis | Liaoning Province, China | TU | a | N, Ins8 | a, b, c
'Flemish Beauty' | P. communis | Belgium | ZZFI | 1 | 1 | 1
'Fuji' | M. domestica | | | 1 | 1 | 1
'Chojuro' | P. pyrifolia Nakai | Kanagawa Pref. Japan | TU | a | N | a, b
'Nijisseiki' | P. pyrifolia | Chiba Pref. Japan | TU | b | N | a, b, c
'Yali' | P. pyrifolia White pear group | Hebei Province, China | TU | a, b | Del2, S | a
'Dangshansuli' | P. pyrifolia White pear group | Anhui Province, China | ZZFI | a, b | N, Del2 | a, b, c
'Yaguang' | P. ussuriensis Maxim. | Liaoning Province, China | CPGR | a | N, S | a, b
'Jianbali' | P. ussuriensis | Liaoning Province, China | TU | a | N, S | a, b, c
P. pashia 1 | P. pashia D.Don | Yunnan Province, China | HRIYN | ? | N | a, c
P. pashia 2 | P. pashia | Yunnan Province, China | HRIYN | b | N | a, b, c
P. dimorphophylla | P. dimorphophylla Makino | Mie Pref. Japan | TU | a, b | N, S | a, b
P. calleryana | P. calleryana Decne. | South China | HRIYN | a, b | N, Del2 | a, c
P. fauriei | P. fauriei Schneid. | Korea | TU | a, b | S | a, b
P. betulaefolia | P. betulaefolia Bge. | Gansu Province, China | CPGR | b | N | a, c
P. x serrulata | P. x serrulata Rehd. | Hubei Province, China | CPGR | a, b | N | a, c
P. xerophila | P. xerophila Yu | Gansu Province, China | GPI | b | N | a, b
P. x hopeiensis | P. x hopeiensis Yu | Hebei Province, China | Hebei Province, China | a | N, Ins8 | a, b
P. x phaeocarpa | P. x phaeocarpa Rehd. | North China | CPGR | a | N | a, b, c
P. hondoensis | P. hondoensis Nakai & Kikuchi | Middle Japan | TU | a | N, Ins8 | a, b, c
P. communis | P. communis L. | Europe | TU | b | 21-bp deletion | a, b, c
P. elaeagrifolia | P. elaeagrifolia Pall. | Turkey, Crimea, South East Europe | TU | b, c | N | a, c
P. amygdaliformis | P. amygdaliformis Vill. | Mediterranean area, South Europe | TU | b, c | N | a, c
P. cossonii | P. cossonii Rehder. | Algeria | TU | b | N | a, b
Outgroup
M. sieboldii | M. sieboldii (Regel.) Rehd | Yunnan Province, China | HRIYN | 1 | N | a, b, c
M. rockii | M. rockii Schneid. | Yunnan Province, China | HRIYN | 1 | 1 | b, c
M. domestica subsp. chinensis | M. domestica subsp. chinensis Li Y. N.-(Nai) | North China | | 1 | 1 | a, b, c
M. neidzwetzkyana | M. neidzwetzkyana (Dieck) Langenf. | Xinjiang (Uygur Autonomous Region) | | 1 | 1 | b, c
'Ralls' | M. domestica Borkh | | | 1 | 1 | 1

a Accessions used for RT-PCR are in bold.
b TU: Tottori University, Japan; CPGR: China Pear Germplasm Repository, Xingcheng, Liaoning Province; ZZFI: Zhengzhou
Fruit Institute, Chinese Academy of Agriculture Science, Zhengzhou, Henan Province, China; GPI: Gansu Pomology Institute, Gansu Academy of Agricultural
Science, Gansu Province, China; HRIYN: Horticultural Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, Yunnan Province, China.
c Different copy types of LFY1 and LFY2 identified by indels and phylogenetic analyses.



subparalogs, Adh1-1a, Adh1-1b and Adh1-1c, were supported
by the tree topologies (Figure 3). These were
outparalogs that arose before the diversification of Malus
and Pyrus. Only Adh1-1c was obtained in all Pyrus
accessions, while all three Adh1-1 outparalogs
were recovered in the same nine accessions. Multiple
intra-individual polymorphic sequences representing one
Adh1-1 outparalog could be recovered in some
individuals. For example, five Adh1-1c sequences of P. x
hopeiensis displaying autapomorphic mutations were
monophyletic in the tree, but only one sequence was
retained in the final dataset. Based on our previous
experience, such small mutations were most probably
caused by Taq polymerase errors during PCR and
cloning. Among these subparalogs, the length of intron 2
varied from 228 to 262 bp while that of intron 3 varied
Table 2 Primers used for PCR amplification and sequencing in this study

Target region | Primer: sequence (5'-3')
Adh, long partial region | Adh1-F1: ATGTCTAATACTGCTGGTCA
 | Adh1-F2: TGATGmAOTCTGGGAGG
 | Adh1-R1: GATOAATOTGTOmA
 | Adh1-R2: TGTGGATOTGCAACGAAGA
 | Adh2-F: TGTOACTTCTGGGATGCCAA
 | Adh2-R: ATGCTAACGATGCACCGCAA
Adh, internal primers | Adh1-F5: AGGAGAATGCAAGGACTGCGCT
 | Adh2-F5: CATOCAAGTCTGAGGAAAG
Reduced Adh1 | Adh1-F2
 | Adh1-R3: CAAAATGGTAGATAGGOT
Adh, specific primers | spAdh1-F: TCTACCA^GTOGGACT
 | spAdh1-R: AACGOTCCTGTACATOAA
 | spAdh2-F: GATOATCAOTCCTCGGCA
 | spAdh2-R: TAATATAGCCGGTGCACTCT
LEAFY, long partial region | LFY-F: TGTCGGAGGAGCCAGTGCAA
 | LFY-R: GGCGTAGCAGTGCACATAGT
LFY1int2 | LFY1-F: TGGACGTOATCAATAAAGA
 | LFY1-R: AGTCGAACTAAATAGTOAA
LFY2int2 | LFY2-F: GTGGGCCCAmCCTGTAGT
 | LFY2-R: G™AATCCGGTCAGA™T
LFY2s | LFY2S-F: CTGTATOACTAmCTGTC
 | MLFY2-F: CGTACGCTOmCTACTGCA
 | LFY2-R
Actin, partial | Pact-F: CCATCCAGGCTGTOTCTC
 | Pact-R: GCAAGGTCCAGACGAAGG



from 114 to 117 bp, and subparalog-associated indels
were observed (data not shown). A string of thymine
residues from 8 bp to 22 bp among Adh1-1c copies
made alignment difficult, thus this region was removed
from phylogenetic analyses. Two sequences possessing
characteristics (indels and substitutions) of different
Adh1-1 outparalogs were identified as recombinants by
the Recombination Detection Program (RDP) [54], and 13
more similar putative recombinants were identified
manually by observing their conflicting positions within
the alignment. These putative recombinants were probably
artificial products created during PCR and were
excluded from all analyses.
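The manual inspection of conflicting positions can be illustrated with a small Python sketch. This is not the RDP algorithm and not the authors' exact procedure; it only shows the idea of reading a query sequence at sites that distinguish two outparalog consensus sequences and asking whether the query switches from one parent to the other. The function names, the use of consensus "parents", and the `min_run` threshold are all assumptions made for illustration.

```python
# Illustrative sketch of spotting a putative PCR recombinant (chimera)
# from diagnostic sites; not RDP and not the study's actual procedure.
from itertools import groupby


def parental_pattern(query: str, parent_a: str, parent_b: str) -> str:
    """Label each diagnostic site of `query` as 'A' or 'B'.

    A diagnostic site is an alignment column where the two parental
    consensus sequences differ; sites where the query matches neither
    parent are ignored.
    """
    labels = []
    for q, a, b in zip(query.upper(), parent_a.upper(), parent_b.upper()):
        if a == b:
            continue
        if q == a:
            labels.append("A")
        elif q == b:
            labels.append("B")
    return "".join(labels)


def looks_recombinant(query: str, parent_a: str, parent_b: str,
                      min_run: int = 2) -> bool:
    """True if the query carries runs of >= min_run sites from both parents."""
    pattern = parental_pattern(query, parent_a, parent_b)
    runs = [(label, len(list(group))) for label, group in groupby(pattern)]
    supported = {label for label, length in runs if length >= min_run}
    return supported == {"A", "B"}
```

Requiring a run of several consecutive diagnostic sites from each parent keeps an isolated substitution, such as a polymerase error, from being read as a recombination breakpoint.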

The final dataset contained 101 Adh1-1 and six Adh1-2
copies with an aligned length of 677 bp. The NSD
within each Adh1-1 subparalog was very low, ranging
from 0.011 to 0.013 (excluding Malus accessions) (Table
4). Adh1-2 from 'Ralls' (M. domestica) was selected as
the outgroup for phylogenetic analyses. MP (data not
shown) and Bayesian analyses (Figure 3) of this dataset
resulted in similar phylogenetic trees with little difference
in support values (the Bayesian posterior probabilities
are generally higher than the bootstrap percentages).
Adh1-1 and Adh1-2 formed two separate clades. Within
the Adh1-1 clade, Adh1-1a and Adh1-1b were closely
related and formed a sister clade to Adh1-1c. Adh1-1c
was obtained in all accessions, but the relationships
were poorly resolved with extensive polytomies. The
occidental species, P. communis, P. amygdaliformis, P.
elaeagrifolia and P. cossonii, were not monophyletic.
Most intra-individual polymorphic sequences were polyphyletic,
e.g. in P. calleryana, P. xerophila, 'Nijisseiki' (P.
pyrifolia) and 'Korlaxiangli' (P. sinkiangensis). Adh1-1a
and Adh1-1b were only obtained in some accessions, thus
the phylogenetic relationships in these two clades were
incomplete, but the occidental species were monophyletic
in both clades.

Transcription of Adh homologs 

Specific genomic PCR (SG-PCR) and RT-PCR using 
locus specific primers produced expected bands in the 
genomic DNA and cDNA samples. Both Adh1 and
Adh2 were transcribed in all investigated tissues and
cultivars (Figure 4). To investigate the transcription of
different Adh homologs, phylogenetic analyses including
sequences derived from SG-PCR, G-PCR and RT-PCR
were conducted for Adh1 and Adh2 separately, and the
putative pseudogenes identified above were also 
included to enhance the findings on functionality of 
Adh homologs. 

As indicated in Additional file 2, transcription of
Adh1-2 was not observed. We speculate that
Adh1-2 is a nonfunctional outparalog, since putative
pseudogenes have been identified and its 3'UTR region
is divergent from that of Adh1-1, as described above. Among
the Adh1-1 outparalogs, the most frequently cloned one,
Adh1-1c, was not recovered by RT-PCR, indicating that
it was a degenerate outparalog. This also explains its
preferential amplification. Transcription of the other
Adh1-1 outparalogs was detected among different tissues
or cultivars. Transcription of Adh2-1 could be
detected in all investigated tissues and cultivars. An
Adh2-1 sequence was not obtained in the three Malus
accessions by G-PCR, but its transcription was found in
'Fuji' (M. domestica) (Additional file 3). Adh2-2 was
only observed in Malus taxa, and transcription of this
paralog was not detected. We could not deduce distinct
tissue-specific or cultivar-specific expression for either
Adh1-1 or Adh2-1. Several anomalous Adh2-1 copies
retaining only intron 6 were recovered by cloning of RT-PCR
products in 'Flemish Beauty' (P. communis) and
'Nanguoli' (P. ussuriensis) (HQ912054, HQ912055 and
HQ912056). Since intron-containing cDNA has not
been reported, those copies may have been amplified from
contaminating genomic DNA, which would indicate the existence of
an additional Adh2 paralog with more intron loss in
Pyrus.

Sequence variation and paralogs of LEAFY 

Twenty-six LFY1 and 27 LFY2 sequences including a 
partial exon 2 were obtained. Four groups with different 
Figure 1 Schematic diagram of Adh and LEAFY genes. Open boxes represent exons, and connecting lines represent introns. Combinations of
neighboring exons show the loss of intron 4 in Adh2-2 and the loss of intron 7 in Adh1-1 and Adh1-2. Arrows indicate the locations and
directions of primers used for PCR amplification. Rows a, b and c are different Adh genes from Malus and Pyrus obtained in this study; Row d is
the genomic Adh sequence from Fragaria (X15588); Row e is Adh1-F from Zea (AF050457); Row f is LEAFY from Malus and Pyrus. For Rows b and
c, the exon 1 and intron 1 of Adh2-1 and Adh2-2 were not amplified by primers Adh2-F and Adh2-R, and the empty boxes indicate incomplete
exon 2 and exon 10.



length variations in the exon region were observed:
LFY1-Malus (407 bp), LFY1-Pyrus (395 bp), LFY2-Malus
(410 bp) and LFY2-Pyrus (401 bp), which were congruent
with those from RT-PCR in P. pyrifolia 'Housui'
(LFY1-AB162029, LFY2-AB162035) and M. domestica
'Fuji' (LFY1-AB162028, LFY2-AB162034), respectively.
These indels affected only the
length of the deduced amino acid sequences. Among
Pyrus accessions, NSD of this partial exon 2 between
LFY1 and LFY2 was relatively high (0.076), while that



Table 3 Sequence divergence (mean value) of the coding regions between and within Adh homologs (excluding
putative pseudogenes)

Each cell gives NSD a; ASD b.

Homolog | Pyrus Adh1-2 (3) c | Malus Adh1-2 (1) | Pyrus Adh1-1 (6) | Malus Adh1-1 (4) | Malus Adh2-2 (3) | Pyrus Adh2-1 (4)
Pyrus Adh1-2 | 0.015; 0.021 d | | | | |
Malus Adh1-2 | 0.035; 0.027 | /; / | | | |
Pyrus Adh1-1 | 0.060; 0.047 | 0.062; 0.046 | 0.016; 0.019 | | |
Malus Adh1-1 | 0.058; 0.050 | 0.062; 0.050 | 0.027; 0.039 | 0.016; 0.027 | |
Malus Adh2-2 | 0.343; 0.235 | 0.339; 0.226 | 0.340; 0.243 | 0.333; 0.235 | 0.017; 0.017 |
Pyrus Adh2-1 | 0.322; 0.185 | 0.318; 0.191 | 0.321; 0.191 | 0.318; 0.190 | 0.197; 0.159 | 0.015; 0.024

a Sequence divergence based on nucleotide sequence (NSD).
b Sequence divergence based on amino acid sequence (ASD).
c The numbers in the parentheses indicate the number of sequences, and both NSD and ASD in the table represent mean values.
d Sequence divergence within each homolog (diagonal cells).
Figure 2 Neighbor-joining (NJ) tree based on amino acid sequences of Adh loci from diverse plant taxa. ADH sequences in Rosaceae are
highlighted in blue. Numbers above the branches or near the branch nodes indicate bootstrap values (1000 replicates). GenBank accession
numbers are in brackets. Multiple intra-individual clones for Adh1 (Adh1-1 and Adh1-2) and Adh2 (Adh2-1 and Adh2-2) are differentiated by the
number in the parenthesis following the taxon name. *: Though Adh2-1 was not obtained by G-PCR in Malus, its transcription was detected by
RT-PCR in 'Fuji' (M. domestica).

Figure 3 Bayesian majority-rule consensus tree for reduced Adh1. Posterior probabilities and bootstrap values greater than 50 are provided
above and below the branches, respectively. Outgroup accessions are highlighted in blue, while occidental species are in green. Multiple
intra-individual clones for each of the three major clades (Adh1-1a, Adh1-1b and Adh1-1c) are differentiated by the fraction in the parenthesis
following the taxon names, and those within the Adh1-1a clade are highlighted by different shapes and colors. The 'G' at the end of the taxon
name indicates sequences obtained by G-PCR.



Zheng et al. BMC Evolutionary Biology 2011, 11:255
http://www.biomedcentral.com/1471-2148/11/255



Table 4 Sequence variations of Adh1-1 and LEAFY
subparalogs in Pyrus (excluding the Malus accessions)

 | Reduced Adh1 | | | The second intron of LEAFY |
 | Adh1-1a | Adh1-1b | Adh1-1c | LFY1int2 | LFY2int2
N a | 37 | 30 | 17 | 53 | 43
NSD b | 0.012 | 0.013 | 0.011 | 0.019 (0.016) d | 0.029 (0.028) e
PI c | 25/617 | 21/633 | 11/645 | 61/653 (42/653) d | 46/562 (40/553) e

a The number (N) of sequences used for analyses.
b Mean sequence divergence of nucleotide sequence (NSD) calculated by the
K-2P method.
c The number of parsimony-informative (PI) sites after alignment.
d PI sites and NSD in brackets were calculated after excluding LFY1int2-c.
e PI sites and NSD in brackets were calculated after excluding LFY2int2-Ins8
and LFY2int2-Del2.

within LFY1 and LFY2 was low at 0.013 and 0.015, 
respectively (data not shown). 

The entire intron 2 of LFY1 among Pyrus species ranged
from 774 to 783 bp, while that of LFY2 ranged from
670 to 700 bp. LFY1int2 and LFY2int2 were amplified in
all accessions using specific primers, but LFY2int2 of M.
rockii and M. domestica subsp. chinensis was amplified
with another forward primer, MLFY2-F (Table 2), due to a
large deletion (approximately 220 bp, GU991522 vs
DQ535886) in these two accessions. LFY2int2 of all
Malus accessions contained a 211-bp insertion, which
made alignment difficult and was removed from the
analyses. NSD within LFY1int2 and LFY2int2 among
Pyrus accessions was 0.019 and 0.029, respectively
(Table 4), while that between Malus and Pyrus was
much higher at 0.057 for LFY1int2 and 0.066 for
LFY2int2 (data not shown).
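Table 4 also reports the number of parsimony-informative (PI) sites for each marker. A site is conventionally counted as parsimony-informative when at least two character states each occur in at least two sequences. The Python sketch below applies that definition to an alignment; the function name is hypothetical, and treating gaps and ambiguous bases as missing data is an assumption that may not match the settings used for Table 4.

```python
# Minimal sketch: count parsimony-informative (PI) sites in an alignment.
# Gaps and ambiguity codes are treated as missing data (an assumption).
from collections import Counter


def count_parsimony_informative(alignment) -> int:
    """Count columns with >= 2 nucleotide states each seen in >= 2 sequences."""
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    informative = 0
    for column in zip(*alignment):
        counts = Counter(base.upper() for base in column
                         if base.upper() in "ACGT")
        # States shared by at least two sequences can support a grouping.
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative += 1
    return informative
```

For the toy alignment `AAGT`, `AAGA`, `ACTA`, `ACTT`, the first column is constant and the remaining three columns each split the sequences two against two, so three sites are informative.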

Sequence variation and phylogenetic analyses of
LFY1int2 suggest two subparalogs, LFY1int2-a and
LFY1int2-b, among Pyrus species. Compared with
LFY1int2-a, LFY1int2-b contains a 6-bp insertion.
Recovery of LFY1int2-a and LFY1int2-b in each accession
is shown in Table 1. Among oriental species, NSD
within LFY1int2-a was lower than that within LFY1int2-b
for both ORF regions (0.002, 0.009) and intron regions
(0.005, 0.013). Between LFY1int2-a and LFY1int2-b,
NSD was 0.007 for the ORF region and 0.017 for the
intron region. For LFY2int2, subparalogs with an 8-bp
insertion (LFY2int2-Ins8) and a 2-bp deletion (LFY2int2-Del2)
were only recovered in a few accessions. The
common LFY2int2-N, with no indel, was recovered in
all accessions but 'Yali' (P. pyrifolia, CWP) (Table 1).
Coexistence of these subparalogs in one individual could
be detected by direct sequencing due to the fixed position
of indels. A minimum of three clones were
sequenced. It was found that PCR or direct sequencing
sometimes did not reflect the subparalogs existing in
one genome, probably due to amplification preference
Figure 4 Specific genomic PCR (SG-PCR) and RT-PCR of Adh1 and Adh2.
Bands of the molecular marker from the bottom up denote size
standards of 0.1 kb, 0.2 kb, 0.5 kb, 0.75 kb, 1 kb and 2 kb. Row a is
SG-PCR of Adh1 and Adh2 in 'Cuiguan' (C1, C2), 'Korlaxiangli' (K1, K2),
'Nanguoli' (N1, N2) and 'Flemish Beauty' (F1, F2); Rows b, c and d are
RT-PCR of Adh1, Adh2 and the control Actin gene in investigated tissues
and cultivars. Cs, Cl and Cf denote seed, leaf and fruit of 'Cuiguan',
respectively; Kf, Nf, Ff and Mf denote fruit of 'Korlaxiangli',
'Nanguoli', 'Flemish Beauty' and 'Fuji', respectively.



of different nuclear alleles. For example, direct sequen- 
cing of LFY2int2 in P. calleryana identified LFY2int2-N, 
but a LFY2int2-Del2 sequence was obtained by cloning. 

Several anomalous LFY2int2 copies were exclusively 
found in particular accessions. LFY2int2 from P. com- 
munis had a 21-bp deletion. LFY2int2 from P. fauriei 
had a 525-bp insertion that was partially homologous 
(reverse and complement) to the noncoding region of 
the S-RNase gene (AB308360), and was named 
LFY2int2-S. To eliminate the possibility of genetic 
recombination during PCR, all accessions were tested 
with an insert-specific upstream forward primer 'LFY2S- 
F' (Table 1) and the reverse primer LFY2-R. As a result, 
LFY2int2-S was detected in 'Yali' (P. pyrifolia, 
CWP), 'Jianbali' (P. pyrifolia, CWP), and 'Yaguang' (P. 
ussuriensis), but the sequences were not included in the 
phylogenetic analyses. LFY2int2-S in P. fauriei was still 
included in the dataset after the exclusion of its large 
insertion. 






Phylogenetic analyses of LFY1int2 and LFY2int2 

Malus sieboldii and M. domestica (DQ535885-AFL1, 
DQ535886-AFL2) without the 220-bp deletion in 
LFY2int2 were used as outgroups in phylogenetic 
analyses of LFY1int2 and LFY2int2. Putative recombinants 
were identified by RDP or by investigating abnormal 
substitution patterns and ambiguous alignment 
positions. The putative recombinants displayed unique 
combinations of substitutions from two distinct subclades and 
always formed well-separated clades in the tree; thus 
they were excluded from the final analyses (data not 
shown). A total of 57 LFY1int2 (four from Malus) and 
46 LFY2int2 (three from Malus) sequences were 
included in two separate datasets. Excluding sequences 
from Malus, the LFY1int2 dataset had an aligned length 
of 653 sites, of which 61 (9.3%) were parsimony 
informative, while LFY2int2 had an aligned length of 562 
sites, of which 46 (8.2%) were parsimony informative (Table 
4). Similarly, only Bayesian trees are shown for both the 
LFY1int2 and LFY2int2 datasets. 
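
The proportion of parsimony-informative sites quoted above can be illustrated with a minimal sketch. It assumes the usual definition (a site is informative when at least two character states each occur in at least two sequences) and ignores gaps and ambiguous symbols; the function names and the toy alignment are hypothetical and are not taken from the study, whose counts were presumably obtained from the phylogenetic software.

```python
from collections import Counter


def is_parsimony_informative(column, ignore="-?N"):
    """True when at least two states each occur in at least two sequences."""
    counts = Counter(c for c in column.upper() if c not in ignore)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_fraction(alignment):
    """Return (informative sites, aligned length, percentage)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    columns = ("".join(seq[i] for seq in alignment) for i in range(length))
    informative = sum(is_parsimony_informative(col) for col in columns)
    return informative, length, 100.0 * informative / length


# Toy alignment (hypothetical, for illustration only).
toy = ["ACGTAC", "ACGTTC", "ATGTTC", "ATG-AC"]
print(informative_fraction(toy))

# The percentages reported in the text follow from the site counts.
print(round(100 * 61 / 653, 1), round(100 * 46 / 562, 1))
```

Applied to a real alignment, the same count would be run separately on each dataset after removing the outgroup sequences, as described in the text.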

As shown in Figure 5, LFY1int2-a formed a monophyletic 
clade. The relationships within this clade were 
largely unresolved; however, close relationships among P. 
x serrulata, P. calleryana and 'Yali' (P. pyrifolia, CWP) 
were resolved with high bootstrap support. Most of the 
subclades of LFY1int2-b were unresolved polytomies 
sister to LFY1int2-a. The occidental species were not 
monophyletic, since three sequences from P. amygdaliformis 
and P. elaeagrifolia formed a highly supported 
independent clade. These three sequences, representing 
a paralog termed LFY1int2-c, displayed many unique 
variations and shared several substitutions with 
outgroup accessions. LFY1int2-c was reamplified in the 
four occidental species, and could be frequently cloned 
in P. amygdaliformis and P. elaeagrifolia. The origin of 
LFY1int2-c is unclear. 

Relationships inferred by LFY2int2 were much better 
resolved, with higher support values and fewer 
polytomies (Figure 6), than those of LFY1int2, but the 
relationships within most subclades were still unresolved. 
Two major clades were resolved. In clade I, the four 
occidental species and a clone of 'Korlaxiangli' (P. 
sinkiangensis) formed a subclade sister to that of oriental 
pears including 'Yaguang' (P. ussuriensis), P. betulaefolia, 
P. x phaeocarpa and P. x hopeiensis. In clade II, 
LFY2int2-Ins8 and LFY2int2-Del2 were monophyletic 
and mixed with LFY2int2-N sequences; thus they were 
inparalogs of recent origin. It is notable that the four 
LFY2int2-N sequences from P. xerophila were putative 
recombinants. They shared mutations with sequences 
from multiple subclades and also had unique mutations. 
Phylogenetic positions of these P. xerophila sequences 
were unresolved: P. xerophila (2/4) and P. xerophila (3/4) 
were sisters and formed a separate clade, P. xerophila 
(4/4) also formed a separate clade, and P. xerophila (1/4) 
shared mutations with occidental species and formed 
a separate subclade within clade I. It seems impossible 
that divergent copies in one genome are all 
recombinants; thus these sequences were included in the 
phylogenetic analyses and are highlighted in bold in Figure 6. 
Similarly, the LFY2int2-S of P. fauriei formed an 
unresolved separate clade. 

Discussion 

Frequency of Adh and LEAFY duplication 

Gene duplication plays an important role in increasing 
the diversity of gene function and expression, which can 
enable plants to colonize diverse habitats. Most 
monocots and dicots have at least two Adh genes, indicating 
that an initial Adh gene duplication occurred before the 
divergence of these plant taxa, and separate duplications 
have subsequently taken place [46]. Two major loci in 
Malus and Pyrus, Adh1 and Adh2, are outparalogs 
derived from an ancient gene split. Based on an 
estimated rate of nucleotide substitution of 0.66% per million 
years for Adh in Drosophila [55] and 0.2-0.3% in 
mammalian nuclear genes [56], the split occurred approximately 
50 million years ago. In maize, Adh1 and Adh2 share 
87% identity at the amino acid sequence level but are 
located on different chromosomes and differ in the level 
of tissue-specific expression [57]. The expression of 
three Adh genes with 85% and 87% shared amino acid 
identity in Vitis vinifera varied with the developmental stage 
of grape berries and in affinity for either ethanol or 
acetaldehyde as a substrate [58]. ADH from apple had optimal 
acetaldehyde activity at pH 5.5-6.0 and ethanol activity 
at pH 7.0-10.0 [59]. Therefore, Adh1-1 and Adh2-1 in 
Pyrus (ASD = 0.170, Table 3), with less shared identity, 
have likely also diversified in their expression patterns and 
substrate affinity. ADH plays an essential role in the 
biosynthetic pathway of aroma volatiles in apple and pear 
fruits by reducing aldehydes to alcohols [60-62]. It will 
be interesting to determine the functional divergence of 
Adh genes and the kinetic properties of the corresponding 
ADH enzymes in Pyrus and Malus. 
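
The dating of the Adh1/Adh2 split rests on molecular clock reasoning. The text does not state which divergence value or formula was used, so the following is only a minimal sketch under a strict clock, assuming the standard relation T = d / (2r), where d is the pairwise divergence between the two copies and r is the substitution rate per lineage. Treating the cited rates as per-lineage rates is an assumption, and the divergence value d below is hypothetical.

```python
def divergence_time_myr(distance, rate_percent_per_myr):
    """Time since two copies split (million years) under a strict clock.

    distance: pairwise divergence between the copies (substitutions per site).
    rate_percent_per_myr: assumed per-lineage rate, in % per million years.
    The factor 2 accounts for change accumulating along both lineages.
    """
    if rate_percent_per_myr <= 0:
        raise ValueError("rate must be positive")
    rate = rate_percent_per_myr / 100.0
    return distance / (2.0 * rate)


# Hypothetical divergence, chosen only to illustrate the arithmetic.
d = 0.25
for r in (0.2, 0.3, 0.66):  # rates quoted in the text, in % per million years
    print(f"rate {r}%/Myr -> {divergence_time_myr(d, r):.1f} Myr")
```

The spread of the resulting times shows how strongly such an estimate depends on the calibration rate, which is why the date in the text is best read as approximate.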

Two paralogs each representing Adh1 (Adh1-1 and Adh1-2) 
and Adh2 (Adh2-1 and Adh2-2) were observed by G-PCR. 
Among these, Adh2-1 may be the most ancestral 
since it has the classical nine-intron Adh structure, 
which is widely conserved among angiosperms and 
gymnosperms [53]. Intron losses have been found in Adh 
genes of diverse taxa like Arabidopsis thaliana [63,64] 
and Mangifera indica [65]. In some species of 
Leavenworthia, an expressed intronless Adh3 locus occurs and is 
thought to have arisen via an mRNA intermediate [66]. 
The single intron loss found in our study lends support 
to the 'intron exclusion hypothesis', which suggests that 
a single intron could be precisely removed by double-strand 
breaks from a multiple-intron gene [67]. Adh1-2, 
with the same gene structure as Adh1-1, was a putative 
nonfunctional outparalog derived by gene duplication 
(loss of intron 7, Figure 1). Gene duplication leading to 
Adh1-2, as inferred by the tree topology (Figure 2), 
occurred before Malus and Pyrus diverged, but probably 
after diversification of Pyreae taxa, since the orthologous 
Adh gene in Fragaria (Rosoideae) displayed the classical 
nine-intron structure. Additionally, the NSD between 
Adh1-1 and Adh1-2 in Pyrus was 0.06 (Table 3), which 
was much lower than that between Adh1-1 in Pyrus and 
Adh in Fragaria (0.10) (data not shown). However, the 
NSD in the coding region between these outparalogs 
was too low to confirm the non-functionality of Adh1-2. 
Adh2-1 was not obtained in Malus accessions by G-PCR, 
probably due to an amplification preference for 
Adh2-2, but its transcription was detected in 'Fuji' 
(Additional file 3). Adh2-2 was recovered neither in 
Pyrus by G-PCR and SG-PCR nor in Malus or Pyrus by 
RT-PCR. Compared with Adh1-1 and Adh1-2, the ASD 
between Adh2-1 and Adh2-2 was much higher (Table 3) 
and the exon/intron structure also varied (Figure 1). Due to 
the lack of highly homologous Adh2 sequences from 
other Rosaceae taxa, the origin of Adh2-2 is uncertain. 
It may be a duplicated inparalog derived from Adh2-1 
and restricted to Malus taxa, or a functional outparalog 
that appeared before Malus and Pyrus diverged and was 
subsequently lost during diversification of Pyrus species. 
The latter theory is similar to the paralog sorting of 
RPB1 and RPB2 in different core eudicot taxa [68]. 
Specific RT-PCR for additional tissues and specific G-PCR 
for more Rosaceae taxa will be needed to examine 
the origin and transcription of Adh2-2. Three 
subparalogs representing Adh1-1 (Adh1-1a, Adh1-1b, Adh1-1c) 
were identified by accession sequencing (Figure 3), 
although Adh1-1c may not be transcribed. It is unknown 
whether similar subparalogs have evolved for Adh1-2 
and Adh2-2. 

Figure 5 Bayesian majority-rule consensus trees for LFY1int2. Posterior probabilities and bootstrap values greater than 50 are provided 
above and below the branches, respectively. Outgroup accessions are highlighted in blue, while occidental species are in green. Multiple 
intra-individual clones are differentiated by the fraction in the parenthesis following the taxon names. Accessions possessing both LFY1int2-a and 
LFY1int2-b are marked by different shapes and colors. 

Our study revealed that ancient and recent duplica- 
tions led to the complex structure of Adh outparalogs in 
Pyrus and Malus resulting in neofunctionalization, non- 
functionalization and possible subfunctionalization, the 
three common fates of gene duplications. Adh homologs 
in Malus and Pyrus were more complex than those in 
other angiosperms like Paeonia [69], grasses [44] or 
legumes [70], but similar to those in Gossypium [48]. 
Gossypium has at least seven Adh loci of two primary 
lineages in diploid species and the Adh gene family is 
dynamic with pseudogenization and gene elimination. 
Genomic data suggest that almost all angiosperms, per- 
haps even all plant groups, have experienced one to sev- 
eral rounds of polyploidy [71,72]. Though Malus and 
Pyrus accessions used in this study were all diploid (x = 
17), Pyreae taxa with x = 17 are derived from autopoly- 
ploidization of the formerly Spiraeoid ancestors with x = 
9 [30]. This apparently accounts for such complex para- 
logs, and similar Adh gene structures could be imputed 
for other Pyreae taxa with x = 17. 

LEAFY was first found to be a homeotic gene encod- 
ing a transcription regulator for differentiation of the 
floral meristem and flowering time in Arabidopsis and 
was expected to be a single-copy gene in diploid 
angiosperms [73]. In our study, two major lineages, LFY1 and 
LFY2, were recovered in both Pyrus and Malus, as in 
many other Pyreae taxa including the formerly Spiraeoideae 
taxa with x = 9 [39,41], suggesting gene duplication of 
these two paralogs before diversification of the Rosaceae. 
LFY1 and LFY2 in apple are located on distinct 
chromosomes and thus are not alleles [74]. In a study including 
the pear cultivars 'Housui' (P. pyrifolia) and 'Bartlett' (P. 
communis), the transcriptional patterns of two LEAFY 
homologs differed in developmental stages and tissues, 
and each homolog varied among plant taxa [41]. In our 
study, genus-specific and locus-specific indels were dis- 
covered in coding regions, which would alter the length 
of the corresponding amino acid sequences. These 
might be responsible for the diversification of LEAFY 
gene functions. 

Multiple inparalogs of LFY1int2 and LFY2int2 were 
observed, and their recovery varied among our 
accessions (Table 1). LFY1int2-a and LFY1int2-b were 
unequally observed among Pyrus species (Table 1). 
LFY1int2-a is monophyletic with shorter branch lengths 
than LFY1int2-b (Figure 5), suggesting it might be an 
inparalog derived from LFY1int2-b by a recent 
duplication. This would have occurred after the divergence of 
occidental and oriental pears, because LFY1int2-a 
is not found in occidental pears. Consequently, 
LFY1int2-b was lost during diversification of some 
oriental species, which explains paralog sorting during 
diversification of Pyrus. Three LFY1int2-c sequences in 
two west Asian species (P. amygdaliformis and P. 
elaeagrifolia) formed a separate clade (Figure 5). We suggest 
two possible explanations for their origin: 1) they are 
pseudogenes derived from LFY1int2-b, but have evolved 
more rapidly, and thus are highly divergent from 
LFY1int2-b; 2) they represent another outparalog of 
LFY1int2, derived from a gene duplication that occurred 
before diversification of the occidental species and that was 
subsequently lost in some occidental species. However, 
only the second intron of LFY1int2-c was sequenced. To 
help differentiate between these two possibilities, the 
entire exonic region must be obtained and its presence 
in more Pyrus species investigated. For LFY2int2, the 
common LFY2int2-N was recovered in all accessions 
but 'Yali' (P. pyrifolia, CWP). LFY2int2-Ins8 and 
LFY2int2-Del2 are inparalogs that originated recently 
after Pyrus diversification and were only recovered in a 
few accessions (Figure 6). LFY2int2-S, with a long 
insertion homologous to the noncoding region of the S-RNase gene, 
was similar to functional AFL1a copies found in some 
apple cultivars [74]. Genomic Southern analysis also 
showed that apple had other homologues in addition to 
AFL1 and AFL2 [75]. However, the relationships among 
these homologs have not been published. Only a few 
accessions contained LFY2int2-S (Table 1), and only the 
one from P. fauriei was included in our analyses. It is 
unknown how the intron of the S-RNase gene was 
inserted in the second intron of LFY2 and whether 
LFY2int2-S is functional. Additional research found both 
LFY2int2-S and LFY2int2-N in multiple species, and 
showed that the LFY2int2-S sequences were all highly 
divergent from LFY2int2-N even after exclusion of the large 
insertion (unpublished data). 

Incongruence and poor resolution 

Paralogs and lineage sorting are two major challenges 
when conducting phylogenetic analyses based on 
LCNGs, because they can lead to incongruent patterns 
similar to those resulting from hybridization and poly- 
ploidization [31]. Paralogs reflect a horizontal event, the 
gene duplication in one species, while orthologs reflect a 
vertical event, the speciation in a lineage [36,76]. Thus, 
it is crucial to differentiate paralogs from orthologs by 
investigating their origins and monophyletic positions in 
the tree. We have clearly identified the outparalogs and 
inparalogs for both Adh and LEAFY genes. If inparalogs 
representing LFY1int2 and LFY2int2 were not identified, 
P. calleryana, possessing both LFY1int2-a and LFY1int2-b 
as well as LFY2int2-N and LFY2int2-Del2, would be 
polyphyletic in both gene trees (Figures 5 and 6) and 
presumed to be a hybrid involving other Pyrus species. 
However, P. calleryana is one of the most ancestral 
species in Pyrus and should not be a hybrid of other Pyrus 
species. Additionally, Pyrus ussuriensis, P. x hopeiensis, 
P. x phaeocarpa and P. hondoensis, containing only 
LFY1int2-a, fell in the same clade, which indicates a close 
relationship (Figure 5). However, P. betulaefolia was not 
in the clade and only contained LFY1int2-b. These 
findings were inconsistent with the hypothesis that P. 
betulaefolia was involved in the origin of P. x hopeiensis and 
P. x phaeocarpa, and with the phylogeny based on 
LFY2int2-N. 

Interspecific hybridization has been considered the 
major mode of evolution for Pyrus [52], and LCNG has 
been useful for testing the hypothesis of hybridization, 
since homologs of a nuclear locus from both parents 
could be detected in putative hybrids through cloning 
[31]. Excluding the possibility of paralogs, incongruence 
caused by hybridization and lineage sorting could be dif- 
ferentiated by comparing phylogenies of multiple 
unlinked nuclear loci. Only Adh1-1c and LFY2int2-N 
are shown to be two independent orthologs, and they 
were recovered in all accessions (except LFY2int2-N in 
'Yali'). As described above, relationships revealed by 
Adh1-1c were poorly resolved, and most intra-species 
and intra-individual sequences were polymorphic. As 
shown in Figure 3, intra-individual sequences of P. 
calleryana, P. hondoensis and P. dimorphophylla were 
polyphyletic, and the occidental species were not 
monophyletic. However, the putative interspecific 
hybrids, P. x hopeiensis and P. x phaeocarpa, were 
monophyletic, which was incongruent with other gene 
trees and our previous understanding of these species. 
In contrast, only four accessions were polymorphic in 
the LFY2int2-N tree, including P. x phaeocarpa, P. x 
hopeiensis, and 'Korlaxiangli' (P. x sinkiangensis) 
(marked in red in Figure 6), all of which were putative 
interspecific hybrids. Therefore, lineage sorting of 
ancestral polymorphic Adh1-1c alleles may have occurred 
during diversification of Pyrus. 

The phylogenetic relationships revealed by LFY2int2-N 
were mostly congruent with other orthologous gene trees 
and previous studies based on other data. Close 
relationships among P. calleryana, P. dimorphophylla, P. pashia, 
P. pyrifolia, and/or P. x serrulata were supported by all 
gene trees, suggesting a close relationship among these 
species. Two distinct LFY2int2-N sequences of 
'Korlaxiangli' (P. sinkiangensis) were grouped with occidental 
species and P. pyrifolia, respectively (Figure 6). Similar 
relationships were found in the Adh1-1a, Adh1-1c and 
Adh1-2 clades (Figure 3), which supports the hypothesis 
that P. sinkiangensis is an interspecific hybrid involving 
at least P. communis and P. pyrifolia [11]. 
Intra-individual copies of P. x hopeiensis and P. x phaeocarpa were 
grouped with P. ussuriensis, P. hondoensis or P. 
betulaefolia in both the Adh1-1a clade (Figure 3) and 
the LFY2int2-N clade (Figure 6). Pyrus x phaeocarpa was a 
putative hybrid involving P. betulaefolia and P. 
ussuriensis, and P. x hopeiensis was a hybrid involving P. x 
phaeocarpa and P. ussuriensis. Pyrus hondoensis, which 
was once classified as a variety of P. ussuriensis by 
morphological data [3], and P. ussuriensis were found to be 
closely related [12]. Phylogenetic relationships among 
these species were supported by multiple orthologous 
gene data, suggesting ancient hybridization rather than 
lineage sorting. More wild individuals of these species 
are needed to test such complex evolutionary histories. 

The relationships based on all separate orthologs were 
mostly poorly resolved. In our study, different Adh and 
LEAFY paralogs showed a relatively low sequence 
divergence (< 0.03). LFY2int2-N showed the highest 
proportion of informative sites (38/562, 6.8%, Table 4), which 
was similar to results in Neillia and Stephanandra 
(7.4%) [39]. Low sequence divergence of multiple DNA 
regions suggests rapid radiation during divergence [77], 
and this has been hypothesized for many Pyreae taxa 
(the former Maloideae taxa) [26,40]. This may also 
explain the poor resolution of the gene trees. Another 
contribution to the poor resolution in this study is the 
conflicting signals caused by recombinants. 
Recombinants are derived from two homologous chromosomes 
in one genome during meiosis (genic recombinants) or 
PCR (artifacts), leading to incorrect phylogenetic 
inferences [78-80]. As predicted by statistical principles, we 
found that putative recombinants formed separate 
clades. Recombinants combine substitutions of two 
distinct lineages, and thus receive no bootstrap support 
from either of the lineages in a cladistic phylogeny (data 
not shown). In this study, most putative recombinants 
represent one of the intra-individual polymorphic copies 
and were excluded from analyses. The four LFY2int2-N 
copies of P. xerophila all displayed the characteristics of 
recombinants, and formed separate clades in the tree 
(Figure 6). Polymorphic LFY2int2-N copies in P. 
xerophila may all be ancient genetic recombinants that arose 
by interspecific hybridization involving both oriental and 
occidental species. More individuals of P. xerophila and 
occidental species are necessary to confirm this 
hypothesis and investigate the origin of this species. 
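
How a recombinant combines substitutions from two lineages can be illustrated with a small sketch. It is not the RDP3 procedure used in the study; it only mimics the manual inspection of substitution patterns, and it assumes that two candidate parental sequences are already aligned with the query. The function names and the toy sequences are hypothetical.

```python
def parental_match_profile(query, parent_a, parent_b):
    """List (position, label) for sites where the two parents differ.

    The label is 'A' if the query matches parent A, 'B' if it matches
    parent B, and '-' if it matches neither. Gapped sites are skipped.
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for pos, (q, a, b) in enumerate(zip(query, parent_a, parent_b)):
        if a == b or "-" in (q, a, b):
            continue  # site does not discriminate between the parents
        profile.append((pos, "A" if q == a else "B" if q == b else "-"))
    return profile


def profile_string(profile):
    """Compact view of the profile, e.g. 'AABBB' for a mosaic sequence."""
    return "".join(label for _, label in profile)


# Toy sequences (hypothetical): the query follows parent A in its first
# part and parent B in its second part.
parent_a = "AAAAAAAAAA"
parent_b = "ATATATATAT"
query = "AAAAATATAT"
print(profile_string(parental_match_profile(query, parent_a, parent_b)))
```

A run of 'A' labels followed by a run of 'B' labels is the mosaic pattern expected of a recombinant, whereas a sequence belonging to a single lineage would carry one label throughout.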

Phylogenetic utility of the introns 

Among three Adh1-1 subparalogs, Adh1-1c was orthologous 
and recovered in all accessions, but it resulted in a 
poorly resolved phylogeny due to a lack of informative sites 
and possible lineage sorting (Table 3). This makes it 
inadequate for the phylogenetic reconstruction of Pyrus. The two 
introns of Adh1-2, Adh2-1 and Adh2-2 were not 
sequenced and analyzed in the current study, and it is 
unknown whether multiple subparalogs also exist for 
these paralogs. Adh2-2 might not exist in Pyrus species as 
discussed above. The phylogenetic utility of the introns of 
Adh2-1 and Adh1-2 needs to be estimated, which will 
require primers designed specifically to the paralogs. 

LFY1int2 was not suitable for studying interspecific 
relationships due to sorting of the LFY1int2-a and LFY1int2-b 
paralogs and the unclear origin of LFY1int2-c. In contrast, 
LFY2int2-N showed the highest sequence divergence, 
resulting in the best-resolved tree. The inparalogs 
LFY2int2-Ins8 and LFY2int2-Del2, as well as the LFY2int2-S of 
unclear origin, could be easily identified and removed 
from phylogenetic inferences. Most importantly, relation- 
ships based on LFY2int2-N were congruent to previous 
studies based on morphological and molecular marker 
data. Conflicting placement of species may be resolved by 
using LFY2int2-N. It provides reliable evidence of ancient 
hybridization, since incomplete lineage sorting was not 
imputed for LFY2int2-N. Phylogenetic studies of Pyrus 
based on nuclear gene regions have been rare. Only the 
ITS region has been applied to a wide range of East Asian 
Pyrus species, but it resulted in a poorly resolved tree [28]. 
One study based on the 18S gene focused only on two 
species, P. pyrifolia and P. communis [81]. We conclude 
that LFY2int2-N is currently the most useful nuclear gene 
region for phylogenetic inference in Pyrus. It is as yet 
unknown whether additional inparalogs representing 
LFY2int2 will be found by analyzing more occidental spe- 
cies and individuals of oriental species. 

Conclusion 

This is the first study that explores LCNGs for phyloge- 
netic analyses in Pyrus. It is also the first to document 
the gene structures and transcription of Adh homologs 
in the Rosaceae taxa. We demonstrated that frequent 
gene duplications contributed to complex outparalogs 
and inparalogs of Adh genes with functional diversifica- 
tion or nonfunctionalization. Paralogs, lineage sorting of 
alleles, and recombinants are three major problems 
when applying LCNGs in plant phylogenetic analyses. 
One ortholog of LEAFY, LFY2int2-N, is currently the 
best nuclear marker for studying interspecific relation- 
ships of Pyrus. Complex reticulate histories likely com- 
plicate the phylogenetic reconstruction of some Pyrus 
species. To better resolve interspecific relationships and 
examine the evolutionary processes of Pyrus, we are 
extending our phylogenetic studies with plastid DNA 
and nuclear DNA, including LFY2int2-N, and by 
sampling a wider assortment of species and individuals. 

Methods 

Taxon sampling, DNA extraction, primer design and 
amplification 

Twenty-five accessions from 13 oriental species and four 
occidental species of genus Pyrus were included. Six 
accessions of four Malus species were used as outgroups 
(Table 1). Total genomic DNA was isolated from fresh 
leaf tissue using a modified sodium dodecyl sulfate 
(SDS) method [12,82]. 

To date, a complete cDNA sequence representing 
Adh1 from 'Granny Smith' (M. domestica, Z48234) and 
two 5' partial (beginning at the 3' end of exon 2) cDNA 
sequences representing two distinct Adh loci 
(AF031900-Adh1, AF031899-Adh2) from 'Packham's 
Triumph' (P. communis) are available. The Adh series 
(Adh1 and Adh2) were named randomly and do not 
correspond to previously named alleles. A forward 
primer (Adh-F1) based on the sequence of Z48243 and three 
downstream primers (Adh1-R1, Adh1-R2 and Adh2-R) 
based on AF031900-Adh1 and AF031899-Adh2 were 
designed to obtain the entire gene region in several 
accessions including P. communis, 'Flemish Beauty' (P. 
communis), 'Nanguoli' (P. ussuriensis), 'Cuiguan' (P. 
pyrifolia), 'Korlaxiangli' (P. sinkiangensis), 'Rails' (M. 
domestica), M. rockii and M. domestica subsp. chinensis. 
However, these primer pairs only succeeded in 
amplifying 12 Adh1 sequences in select accessions. Therefore, 
two additional forward primers, Adh1-F2 and Adh2-F, 
were designed based on AF031900-Adh1 and 
AF031899-Adh2, respectively, targeting a partial Adh 
region lacking exon 1 and intron 1. These primers 
amplified Adh2 sequences and additional Adh1 
sequences. All of the above PCR products were 
designated as long partial genomic Adh sequences (G-PCR). 
Considering labor costs and difficulties in amplifying 
and sequencing fragments greater than 2 kb, a smaller 
region covering only introns 2 and 3 (about 650 bp) of 
Adh1 (reduced Adh1) was used in all accessions to 
construct a phylogeny. 

For LEAFY, a long partial region of LFY1 and LFY2 
spanning exon 2 and intron 2 was first amplified in 
some accessions using the primer pair 'LFY-F + LFY-R' 
developed in an exonic region of M. domestica 'Pinova' 
(DQ535885, DQ535886). After initial sequence analyses, 
the specific primer pairs 'LFY1-F + LFY1-R' and 'LFY2-F 
+ LFY2-R' were developed to amplify partial intron 2 of 
LFY1 and LFY2 (LFY1int2, LFY2int2), respectively. This 
was done independently in all accessions except the 
four commercial cultivars ('Cuiguan', 'Rails', 'Flemish 
Beauty' and 'Fuji'). Sequence information for primers 
used in this study is listed in Table 2 and their locations 
are illustrated in Figure 1. 

PCR was carried out in a final reaction volume of 50 
µL, containing 10-20 ng total DNA, 2.5 mM MgCl2, 0.4 
µM of each primer, 5% DMSO (v/v), 0.2 mM dNTP, 2 
U Taq DNA polymerase (Takara Biotechnology 
Company Co., Ltd, Kyoto, Japan) and 1 x PCR buffer 
supplied by the manufacturer. Amplification of the long 
partial Adh region was performed for 4 min at 94°C, 
followed by 35 cycles of 40 s at 94°C, 40 s at 58°C, and 2 
min 20 s at 72°C, and a final extension for 7 min at 
72°C. For other shorter regions like locus-specific 
RT-PCR as described below, the PCR procedure was 
identical, but only 1 min was needed for the extension step. 

Cloning and sequencing 

PCR products were verified by 2% agarose gel 
electrophoresis, and the target bands were separated and 
purified using 3S spin DNA Agarose Gel Purification 
(Shenergy Biocolor, Shanghai, China). For long partial 
genomic Adh amplified by G-PCR, the purified PCR 
products were cloned using the TA cloning kit pMD19 
(Takara) and more than three clones per sample were 
sequenced using M13+ and M13- primers and internal 
primers located at exonic regions (Adh1-F5 for Adh1, and 
Adh2-F5 for Adh2, Table 2). For the reduced Adh1 
region, five to ten clones per sample were sequenced 
using the M13+ primer. For the long partial LEAFY 
region, more than three clones were sequenced to 
obtain reads representing putative LFY1 and LFY2. For 
LFY1int2 and LFY2int2, the purified PCR products were 
directly sequenced with the amplification primers. 
Additionally, more than three clones were sequenced to capture 
all the copies indicated by direct sequencing results. 

Sequence analyses 

Intronic and exonic boundaries were determined by 
comparison with available cDNA sequences and by 
conservation of 'GT' and 'AG' at the two ends of introns. 
Sequences were aligned with Clustal X [83]. Sequence 
divergence within and between different homologs was 
calculated using MEGA4 [84] with gaps treated as 
pairwise deletions. Putative recombinants were detected 
using the RDP3 software package [54], and some putative 
recombinants were identified manually. 
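
The effect of treating gaps as pairwise deletions can be shown with a minimal sketch. It assumes an uncorrected p-distance (the proportion of differing sites), since the substitution model used for the divergence values is not stated here; the function names and toy sequences are hypothetical and this is not MEGA4 code.

```python
from itertools import combinations


def p_distance(seq1, seq2, gap_chars="-?"):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap or missing symbol in either sequence are skipped
    for this pair only (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_distance(sequences):
    """Average p-distance over all pairs, as for a within-group value."""
    pairs = list(combinations(sequences, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy example (hypothetical): one gapped site is skipped, one of the
# remaining eight sites differs.
print(p_distance("ACGT-ACGT", "ACGTTACGA"))
```

With pairwise deletion a gapped site is dropped only from the comparisons that involve the gapped sequence, so more sites are retained than under complete deletion, where the column would be removed for every pair.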

Transcription of Adh homologs 

The expression patterns of LFY1 and LFY2 have been 
well documented in Malus and Pyrus [41], but little was 
known about the expression patterns of Adh1 and Adh2 in 
Pyrus. Therefore, 'Cuiguan' (P. pyrifolia), 'Nanguoli' (P. 
ussuriensis), 'Korlaxiangli' (P. sinkiangensis) and 'Flemish 
Beauty' (P. communis), representing four major pear 
cultivar groups, together with 'Fuji' (M. domestica) were 
selected to examine transcription of Adh1 and Adh2. 
For 'Cuiguan' (P. pyrifolia), fresh young leaves, ripe 
fruits and seeds were collected in our campus yard for 
Adh expression analyses, while for the other accessions, 
only ripe fruits were used. The plant tissues were frozen 
in liquid nitrogen and stored at -80°C. Total RNA was 
isolated using a modified CTAB method. First strand 
cDNA was synthesized from 1.0 µg of total RNA using 
poly(T)18 as primer and AMV reverse transcriptase 
(Bio Basic Inc, New York, USA) according to the 
manufacturer's instructions. To efficiently detect transcription 
of Adh1 and Adh2 independently, two specific primer 
pairs, 'spAdh1-F + spAdh1-R' and 'spAdh2-F + spAdh2-R' 
(Table 2), targeting a shorter region were used for 
RT-PCR and specific genomic PCR (SG-PCR). PCR 
products were directly sequenced followed by cloning to 
identify copies involved in transcription. Actin was 
analyzed as a reference gene. The primers Pact-F and 
Pact-R were designed based on the Actin gene sequences 
from P. communis and 'Yali' (P. pyrifolia, CWP) 
(AB190176, GU830958) (Table 2). 

Phylogenetic analyses 

The predicted amino acid sequences of long partial Adh 
genes in Pyrus and Malus from G-PCR were compared 
with those from other well-studied plant taxa by 
conducting NJ and MP analyses using PAUP 4.0b10 [85]. 
For reduced Adh1, nucleotide sequences of both the 
exonic and intronic regions were included. 
LFY1int2 and LFY2int2 were analyzed separately, since 
their sequence homology was too low to be aligned. MP 
analyses were conducted using PAUP 4.0b10 with gaps 
treated as missing data. MP analyses were performed 
using a heuristic search with the TBR and MulTrees 
options. To estimate support for the clades, 
non-parametric bootstraps were estimated with 1000 replicates. 
Bayesian analyses were performed with MrBayes 3.1 
[86]. The best fitting substitution models for each 
dataset were determined with the Akaike Information 
Criterion (AIC) using ModelTest 3.06 [87]. The AIC 
favored HKY+G for the reduced Adh1 and 
K81uf+G for both LFY1int2 and LFY2int2. Markov 
chains were run for 10,000,000 generations with a sample 
frequency of 100. The average standard deviation of 
split frequencies was 0.003 for reduced Adh1, 0.005 for 
LFY1int2 and 0.002 for LFY2int2, indicating that the runs 
had reached convergence for each dataset. The first 
25% of the trees were discarded as burn-in. Clade 
posterior probabilities were calculated from the combined 
sets of trees. Both MP and Bayesian analyses resulted in 
largely congruent tree topologies. Sequences included in 
the final phylogenetic analyses were deposited in 
GenBank (Accessions GU991401-991522, HM003976- 
004066, HQ912028-HQ912076). Alignments of these 
datasets are deposited as additional files 4, 5, 6 and 7. 
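
The burn-in and posterior probability steps can be sketched as follows. This is an illustration of the principle rather than of MrBayes itself: each sampled tree is represented here simply as a set of clades (a hypothetical data structure), the first 25% of samples are dropped, and the posterior probability of a clade is taken as the fraction of retained trees that contain it. The toy taxa A to E are placeholders.

```python
from collections import Counter


def discard_burnin(samples, fraction=0.25):
    """Drop the first `fraction` of the sampled trees."""
    start = int(len(samples) * fraction)
    return samples[start:]


def clade_posteriors(samples):
    """Fraction of retained trees that contain each clade.

    Each tree is a collection of clades; each clade is a frozenset of
    taxon names.
    """
    counts = Counter()
    for tree in samples:
        counts.update(set(tree))
    return {clade: n / len(samples) for clade, n in counts.items()}


# Sampling every 100 generations over 10,000,000 generations gives
# roughly 100,000 samples per run, of which about 25,000 are burn-in.
n_samples = 10_000_000 // 100
print(n_samples, int(n_samples * 0.25))

# Toy chain (hypothetical): two early trees, then six later trees.
AB, CD, DE, AC = (frozenset(x) for x in ("AB", "CD", "DE", "AC"))
chain = [{AC, DE}] * 2 + [{AB, CD}] * 4 + [{AB, DE}] * 2
posteriors = clade_posteriors(discard_burnin(chain))
for clade, p in sorted(posteriors.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), round(p, 3))
```

In a majority-rule consensus tree, only clades whose frequency exceeds 0.5 among the retained trees would be shown, which matches the threshold used for the support values in the figures.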

Additional material 



Additional file 1: 50% majority-rule consensus tree based on amino 
acid sequences of Adh loci from diverse plant taxa. ADH sequences 
in Rosaceae are highlighted in blue. Numbers above the branches or 
near the branch nodes indicate bootstrap values (1000 replicates). 
Accession numbers are given for sequences from GenBank. Multiple 
intraindividual sequences for Adh1 (Adh1-1 and Adh1-2) and Adh2 (Adh2- 
1 and Adh2-2) are differentiated by the number in the brackets following 
the taxon name. *Although Adh2-1 was not obtained by G-PCR in Malus, its 
transcription was detected by RT-PCR in 'Fuji' (M. domestica), as 
described in the text. 

Additional file 2: Transcription of Adh1 homologs revealed by 
neighbor joining (NJ) analyses. Sequences obtained from genomic 
PCR are marked by G followed by the corresponding Adh1-1 subparalog 
name in square brackets. Sequences obtained from RT-PCR are 
marked by RT followed by the plant tissues used in parentheses. 
Sequences obtained from specific genomic PCR are marked by SG. 
Multiple intraindividual sequences obtained from different PCR or plant 
tissues are differentiated by the fraction in the brackets following the 
taxon name. Putative pseudogenes obtained by G-PCR are marked by 'Ψ'. 

Additional file 3: Transcription of Adh2 homologs revealed by 
neighbor joining (NJ) analyses. Sequences obtained from genomic 
PCR are marked by G followed by the paralog name in square 
brackets. Sequences obtained from RT-PCR are marked by RT followed by 
the plant tissues used in parentheses. Sequences obtained from specific 
genomic PCR are marked by SG. Multiple intraindividual sequences 
obtained from different PCR or plant tissues are differentiated by the 
fraction in the brackets following the taxon name. Putative pseudogenes 
obtained by G-PCR are marked by 'Ψ'. 

Additional file 4: Alignment of ADH amino acid sequences from 
different plant taxa. 

Additional file 5: Alignment of the reduced Adh1 nucleotide 
sequences from Malus and Pyrus. 

Additional file 6: Alignment of LFY1int2 nucleotide sequences from 
Malus and Pyrus. 

Additional file 7: Alignment of LFY2int2 sequences from different 
plant taxa. 